This notebook is the solution to the test on analysing the LSE FTSE50 dataset. Expressions prefixed with "#" are commented out and can be uncommented to see the results. Please run the code to execute each cell. The interactive charts are saved as HTML files and will be sent along; they will not work in a static HTML export of this notebook. The code to download the datasets created in this notebook is provided in the last cell.

Answer to question 3(b). If I had more time, I would do some prediction on this data to forecast how the index will develop in the future. I would also build a JavaScript application on top of it: using dc.js, I would create an interactive application and visual summary of this data, similar to the examples shown at https://dc-js.github.io/dc.js/.

The first step is to import all packages to be used.

In [71]:
import datetime
import pandas_datareader.data as wb
import pandas as pd
import numpy as np
import seaborn as sns #visualisation
import holoviews as hv
from holoviews import opts, dim
hv.extension('bokeh')
import panel as pn
import panel.widgets as pnw
import hvplot.pandas #noqa
from dateutil.relativedelta import relativedelta
import matplotlib.pyplot as plt
import matplotlib

Then load the ticker data spreadsheet.

In [2]:
ticker_data = "ftse50lse.csv"
df_ticker = pd.read_csv(ticker_data)
tick = df_ticker.Ticker

Read the data from Yahoo Finance and convert it into a pandas dataframe.

In [3]:
p = wb.DataReader(tick, 'yahoo','2017-9-19','2021-9-01')

See an overview of the downloaded data.

In [92]:
p.axes

We are only interested in the adjusted closing prices.

In [5]:
df_close = p['Adj Close']
In [93]:
#len(p["Adj Close"]) #check the length of the data. returns number of rows

Functions to check for null values.

In [94]:
print(df_close.isnull().values.any()) # returns true if there are null values in the dataframe
print(df_close.isnull().values.sum()) # returns total count of null values
#print(df_close.isnull().sum()) # returns count of null values with column numbers

The next cell removes all rows containing null values and stores the result in a new dataframe.

Solution to 2(b). In this step I preserved the columns (tickers), because the ratio of null values per column is very low. If a column had many null values, it would instead make sense to delete that column and preserve the rows. This is a common data-cleaning practice.
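As a toy illustration of that trade-off (hypothetical tickers, not the FTSE dataset): dropping rows removes dates with any gap, while dropping sparse columns via a threshold removes only the tickers with too little data.

```python
import pandas as pd
import numpy as np

# Hypothetical prices: 5 dates x 3 tickers, with a few gaps.
toy = pd.DataFrame({
    "AAA.L": [1.0, 2.0, 3.0, 4.0, 5.0],
    "BBB.L": [1.0, np.nan, 3.0, 4.0, 5.0],           # one missing value
    "CCC.L": [np.nan, np.nan, np.nan, 4.0, np.nan],  # mostly missing
})

rows_kept = toy.dropna()                   # drop any date with a gap
cols_kept = toy.dropna(axis=1, thresh=4)   # keep tickers with >= 4 valid prices

print(rows_kept.shape)  # (1, 3) -- only one complete date survives
print(cols_kept.shape)  # (5, 2) -- the sparse ticker CCC.L is dropped
```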

In [8]:
df_clean=df_close.dropna()
In [109]:
print(df_close.shape) # size of dataframe with null values
print(df_clean.shape) # size of dataframe without null values
print(df_clean.isnull().values.sum()) ## see no null values :)
In [10]:
df_clean.head() # a sneak peek into the dataset
Out[10]:
Symbols ULVR.L AZN.L HSBA.L DGE.L GSK.L RIO.L BATS.L BP.L RDSA.L RDSB.L ... NXT.L NWG.L PSN.L SKG.L OCDO.L MNDI.L ITRK.L HLMA.L ENT.L CRDA.L
Date
2017-09-19 3837.117188 4197.256836 593.825989 2272.705811 1173.718506 2836.630615 3622.547607 356.334686 1724.121826 1762.332275 ... 4495.945312 216.502228 1658.849487 1991.695557 296.100006 1672.000977 4491.386230 1044.183472 742.240234 3480.189697
2017-09-20 3816.069824 4177.914551 590.558777 2209.423584 1175.334961 2827.760742 3589.427002 358.912903 1743.948486 1778.511353 ... 4579.035645 217.180115 1678.875732 1979.314819 290.200012 1664.595581 4509.860840 1043.217407 747.892822 3476.474365
2017-09-21 3737.144531 4192.421387 589.415161 2203.959961 1186.651611 2794.298584 3557.476562 362.116119 1752.446167 1793.881836 ... 4592.583496 220.739075 1652.173950 1955.378052 285.000000 1665.418457 4477.529297 1038.387939 741.806213 3498.765381
2017-09-22 3739.336914 4318.585449 593.907654 2218.528809 1191.501587 2822.116699 3593.323242 364.537964 1772.272583 1813.296875 ... 4642.256836 222.264313 1658.182007 1937.219482 282.200012 1675.292480 4522.794922 1042.251709 757.730530 3487.620117
2017-09-25 3743.721436 4334.411133 589.088440 2218.528809 1196.352051 2770.512207 3573.062012 369.069336 1788.053345 1827.858398 ... 4691.930664 221.501678 1662.187378 1925.663818 283.899994 1670.355469 4510.785645 1051.910889 751.095459 3479.260742

5 rows × 50 columns

The next cell does the heavy calculations. It computes the FTSE50 index and the partial values for the other indices.

In [11]:
#Function that looks up the values for a ticker in the ticker spreadsheet
def get_shares(label):
    for index, row in df_ticker.iterrows():
        if label == row["Ticker"]:
            no_shares = row["Shares"]
            sector = row["Sector"]
            weight = row["Weight"]
    return (sector, weight, no_shares)

value = []
ind = []
sector_indx = []  # storage for the partially calculated sector indexes
weight_val = []
weight_indx = []  # “true” FTSE 50 Index level
ffa = []
ffa_lst = []      # storage for the partially calculated free float adjustment factor
for index, row in df_clean.iterrows():
    for i, j in row.items():
        sector, weight, no_shares = get_shares(i)
        no_shares = int(no_shares.replace(",", ""))
        val = j * no_shares
        value.append(val)
        weight = float(weight.replace("%", ""))
        weight_val = j * weight
        ffa = weight_val / len(df_clean)
        ffa_lst.append([weight_val, index, i])
        sector_indx.append([sector, index, val])
    ind.append(sum(value) / len(df_clean))
    weight_indx.append(sum(value))
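The nested loops above call get_shares once per cell, which is slow; the core cap-weighted calculation (price × shares in issue, summed per day) can also be sketched with vectorized pandas operations. A minimal toy example with hypothetical tickers and share counts, not the real FTSE constituents or the exact formula above:

```python
import pandas as pd

# Hypothetical close prices (rows = dates, columns = tickers).
prices = pd.DataFrame(
    {"AAA.L": [100.0, 110.0], "BBB.L": [50.0, 55.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05"]),
)
shares = pd.Series({"AAA.L": 1000, "BBB.L": 2000})  # shares in issue

# Market cap per stock per day, then the cap-weighted index level.
market_cap = prices.mul(shares, axis=1)
index_level = market_cap.sum(axis=1)
print(index_level.tolist())  # [200000.0, 220000.0]
```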
    

This is the FTSE50 index that was asked for in 1(c). In the last cell of this notebook there is code to download all of these datasets.

In [124]:
df_index = pd.DataFrame(zip(df_clean.index, ind), columns = ["date","index"])
print(df_index[0:10])
        date         index
0 2017-09-19  1.339637e+11
1 2017-09-20  2.677366e+11
2 2017-09-21  4.013677e+11
3 2017-09-22  5.358444e+11
4 2017-09-25  6.701572e+11
5 2017-09-26  8.042756e+11
6 2017-09-27  9.389508e+11
7 2017-09-28  1.073775e+12
8 2017-09-29  1.209579e+12
9 2017-10-02  1.346573e+12

These are the sub-indices for each sector, as asked in 1(d). The result is a multi-index dataframe.

In [125]:
df_sectorindx = pd.DataFrame(sector_indx,columns = ["sector","date","value"])
df_sectorindx=df_sectorindx[['date','sector',"value"]].groupby(['date', "sector","value"]).sum()/len(df_clean)
df_sectorindx.head()
Out[125]:
date sector value
2017-09-19 Consumer 5.138137e+11
8.094096e+11
2.001342e+12
2.704679e+12
3.838911e+12
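For reference, grouping only by date and sector collapses the frame to one value per sector per day; including the value column in the groupby keys, as above, keeps each row as its own group. A toy sketch with hypothetical sectors:

```python
import pandas as pd

caps = pd.DataFrame({
    "date":   ["2021-01-04", "2021-01-04", "2021-01-04"],
    "sector": ["Finance", "Finance", "Health"],
    "value":  [100.0, 200.0, 50.0],
})

# One value per (date, sector): the sum of the constituent values.
sub_index = caps.groupby(["date", "sector"])["value"].sum()
print(sub_index.loc[("2021-01-04", "Finance")])  # 300.0
print(sub_index.loc[("2021-01-04", "Health")])   # 50.0
```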
In [110]:
#print(len(weight_indx))

The following cell first computes the "true" FTSE50 index that was asked for in 3(a), and then the free-float adjustment factor.

In [126]:
df_wgt_index = pd.DataFrame(zip(df_clean.index, weight_indx), columns = ["date","index"])
print(df_wgt_index[0:10])
#This is free floating adjustment factor
df_ffa= pd.DataFrame(ffa_lst, columns = ["ffa","date","stock"])
df_ffa_gp=df_ffa.pivot_table(index ='date', columns =['stock'], values =['ffa']).reset_index()
#df_ffa_gp.head()
        date         index
0 2017-09-19  1.326241e+14
1 2017-09-20  2.650593e+14
2 2017-09-21  3.973541e+14
3 2017-09-22  5.304860e+14
4 2017-09-25  6.634557e+14
5 2017-09-26  7.962328e+14
6 2017-09-27  9.295613e+14
7 2017-09-28  1.063037e+15
8 2017-09-29  1.197483e+15
9 2017-10-02  1.333107e+15
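The pivot from long to wide format used for the adjustment factor above can be sketched on toy data (hypothetical stocks and values):

```python
import pandas as pd

# Long format: one row per (date, stock) observation.
long_df = pd.DataFrame({
    "date":  ["d1", "d1", "d2", "d2"],
    "stock": ["AAA.L", "BBB.L", "AAA.L", "BBB.L"],
    "ffa":   [1.0, 2.0, 3.0, 4.0],
})

# Wide format: one row per date, one column per stock.
wide = long_df.pivot_table(index="date", columns="stock", values="ffa").reset_index()
print(wide["AAA.L"].tolist())  # [1.0, 3.0]
print(wide["BBB.L"].tolist())  # [2.0, 4.0]
```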

Time series of the main index, as asked in 2(a). The index has trended upward over time.

In [127]:
df_index.hvplot.line(x="date", y="index") 
Out[127]:

Time series of the sector indices, as asked in 2(a). The finance sector dropped sharply after March 2020. The consumer sector has been growing since mid 2019. The health sector has risen strongly since 2020 and remains high, possibly due to the Corona pandemic.

In [128]:
df_sectorindx.hvplot.line(x = "date", y = "value", by ='sector')
Out[128]:

The same time series of the sector indices as above, but with a finer view and more controls. The following code also saves the chart as HTML for sharing or offline use.

In [129]:
plt_sectorindx = df_sectorindx.hvplot.line(x = "date", y = "value", groupby ='sector')
hv.save(plt_sectorindx, 'plt_sectorindx.html', backend='bokeh')
plt_sectorindx

Convert the multi-index dataframe to a single index for ease of use and for the visualisations in the next questions.

In [130]:
df_sector = df_sectorindx.reset_index()
print(df_sector)
            date      sector         value
0     2017-09-19    Consumer  5.138137e+11
1     2017-09-19    Consumer  8.094096e+11
2     2017-09-19    Consumer  2.001342e+12
3     2017-09-19    Consumer  2.704679e+12
4     2017-09-19    Consumer  3.838911e+12
...          ...         ...           ...
49495 2021-09-01  Technology  1.152148e+12
49496 2021-09-01  Technology  1.176534e+12
49497 2021-09-01  Technology  1.465725e+12
49498 2021-09-01  Technology  3.272854e+12
49499 2021-09-01  Technology  4.075636e+12

[49500 rows x 3 columns]

Which sector performed best over the last 18 months? Question from 2(b).

The health sector performed best, with the consumer sector close behind. Finance performed worst: it was higher before and has since fallen. Retail, on the other hand, was always low.

In [131]:
end_date=df_sector["date"].max()
#print(end_date)
eighteen_m_ago = end_date - relativedelta(months=18) # date 18 months ago
#print(eighteen_m_ago)

#create a dataframe from this condition
df_eighteen_sector = df_sector[(df_sector["date"] >= eighteen_m_ago) & (df_sector["date"] <= end_date)].copy() # .copy() avoids a SettingWithCopyWarning when the date column is reformatted later
#df_eighteen_sector.head()

#visualize it
df_eighteen_sector.hvplot.line(x = "date", y = "value", by ='sector')
Out[131]:
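The window selection with relativedelta can be illustrated on hypothetical dates: subtracting relativedelta(months=18) from the latest date gives the calendar start of the window, and a boolean mask keeps only the rows inside it.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({
    "date": pd.to_datetime(["2019-06-01", "2020-06-01", "2021-09-01"]),
    "value": [1.0, 2.0, 3.0],
})

end_date = df["date"].max()
start = end_date - relativedelta(months=18)  # 2020-03-01
window = df[(df["date"] >= start) & (df["date"] <= end_date)]
print(window["value"].tolist())  # [2.0, 3.0] -- the 2019 row falls outside
```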

The same time series of the sector indices as above, but with a finer view and more controls. The following code also saves the chart as HTML.

In [132]:
plt_eighteen_sector = df_eighteen_sector.hvplot.line(x = "date", y = "value", groupby ='sector')
# Save the chart as html
hv.save(plt_eighteen_sector, 'plt_eighteen_sector.html', backend='bokeh')
plt_eighteen_sector

An interactive chart showing the values for different years and sectors. The code to save the chart as HTML follows.

In [133]:
df_eighteen_sector.hvplot.bar(x = "date.month" , y = "value", groupby=['date.year', 'sector'] )
Out[133]:

Solution to 2(c): calculate the rolling averages and standard deviations, then create dataframes from them.

In [134]:
# The following code will save the chart as html
plt_eighteen_sector_bar = df_eighteen_sector.hvplot.bar(x = "date.month" , y = "value", groupby=['date.year', 'sector'] )
hv.save(plt_eighteen_sector_bar, 'plt_eighteen_sector_bar.html', backend='bokeh')
                                                                                                                       
In [135]:
#Rolling average for both indices
avg_indx_1 = df_index["index"].rolling(window=30).mean()
avg_indx_3 = df_index["index"].rolling(window=90).mean()
avg_indx_6 = df_index["index"].rolling(window=120).mean()
avg_sec_1 = df_sector["value"].rolling(window=30).mean()
avg_sec_3 = df_sector["value"].rolling(window=90).mean()
avg_sec_6 = df_sector["value"].rolling(window=120).mean()
In [136]:
#standard deviation for both indices
sd_indx_1 = df_index["index"].rolling(window=30).std()
sd_indx_3 = df_index["index"].rolling(window=90).std()
sd_indx_6 = df_index["index"].rolling(window=120).std()
sd_sec_1 = df_sector["value"].rolling(window=30).std()
sd_sec_3 = df_sector["value"].rolling(window=90).std()
sd_sec_6 = df_sector["value"].rolling(window=120).std()
In [137]:
#putting the above values in dataframe
df_avg=pd.DataFrame(zip(avg_sec_1,avg_sec_3,avg_sec_6), columns=["roll_avg_1","roll_avg_3","roll_avg_6"])
df_avg.head()
df_sd=pd.DataFrame(zip(sd_sec_1,sd_sec_3,sd_sec_6), columns=["sd_1","sd_3","sd_6"])
df_sd.head()
# remove na values
df_avg= df_avg.dropna()
df_sd= df_sd.dropna()
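The rolling mean and standard deviation for a window can also be collected into a single dataframe in one step; a small sketch on a toy series (the first window-1 rows are NaN and are dropped, as above):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling statistics over a 3-observation window, NaN rows removed.
roll = pd.DataFrame({
    "roll_avg_3": s.rolling(window=3).mean(),
    "sd_3": s.rolling(window=3).std(),
}).dropna()

print(roll["roll_avg_3"].tolist())  # [2.0, 3.0, 4.0]
print(roll["sd_3"].tolist())        # [1.0, 1.0, 1.0]
```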

Change the date format for creating the visualisations.

In [142]:
df_eighteen_sector['date'] = df_eighteen_sector["date"].dt.strftime("%m/%y")
print(df_eighteen_sector['date'])
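dt.strftime turns the datetime column into short month/year labels like the ones used above; a minimal sketch on hypothetical dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-03-15", "2021-09-01"]))
labels = dates.dt.strftime("%m/%y")  # month/year, e.g. 03/20
print(labels.tolist())  # ['03/20', '09/21']
```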

Overview chart of the performance of the different sectors over the last 18 months.

In [139]:
plt.figure(figsize=[20,7])
sns.barplot(x='date', y='value',  hue = "sector", data=df_eighteen_sector)
Out[139]:
<matplotlib.axes._subplots.AxesSubplot at 0x2b08951d6a0>

Solution to 2(c) for the rolling average. The overlay chart is shown below.

In [140]:
bar = plt.figure(figsize=[30,5])
bar=sns.barplot(x='date', y='value',  hue = "sector", data=df_eighteen_sector)
bar.tick_params(axis='y')
line=bar.twiny()
line=sns.lineplot( data=df_avg, sort=False, linewidth = 0.3)
line.tick_params(axis='y')
plt.show()

Solution to 2(c) for the standard deviation. The overlay chart is shown below.

In [141]:
# solution to answer 2c for sd
bar = plt.figure(figsize=[30,5])
bar=sns.barplot(x='date', y='value',  hue = "sector", data=df_eighteen_sector)
bar.tick_params(axis='y')
line=bar.twiny()
line=sns.lineplot( data=df_sd, sort=False, linewidth = 0.3)
line.tick_params(axis='y')
plt.show()
In [ ]:
# download files from all the above created datasets
df_index.to_csv("ftseindex.csv")
df_sectorindx.to_csv("sectorindex.csv")
df_wgt_index.to_csv("wgt_indx.csv")
df_ffa_gp.to_csv("ffa.csv")